COHERENS workshop #2

Florian Ricour (ECOMOD)
Ludovic Lepers (MFC)

Part I

JupyterHub on ECMWF

A clear procedure to get there

  • Need an ECMWF account
  • Access from a local terminal
    1. tsh login --proxy=jump.ecmwf.int
    2. ssh -X hpc-login
  • Tutorial to connect to JupyterHub
  • Multiple server options (Profile, CPU number, session duration, …)

Let’s try JupyterHub!

File System Features
File System | Features | Quota
HOME | Backed up | 10 GB
PERM | No back up | 500 GB
HPCPERM | No back up | 100 GB* / 1 TB
SCRATCH | No back up | 50 TB* / 2 TB
TMPDIR | Deleted at the end of session/job | 3 GB by default
* for users without HPC access such as ECS

Conclusion on Part I

  • A GUI is more user-friendly than a cold, heartless terminal
  • Take advantage of the HPC filesystem at your disposal
  • Recent service (April 2024), you will look so damn cool!
  • Sessions last up to 7 days; just go to the URL and you’re back online
  • Use it for tasks other than model simulations
  • Save time on heavy downloads/uploads from/to SharePoint (Erk)
  • Preserve your laptop and run big Python/R codes on ECMWF
  • You want to go home but need to wait for a script to finish? \(\rightarrow\) ECMWF
  • I like it and I hope you’ll use it!

Short break

Part II

Artificial Intelligence, how smart is it?

Who hasn’t used ChatGPT here?

  • chatGPT (you know it)
  • Claude.ai (20$/month well spent)
  • GitHub Copilot (code completion)
  • Perplexity (search engine)
  • NotebookLM (nice podcasts)
  • DeepL (translation)
  • Many more

Artificial intelligence

AI is a vague term

Artificial Neural Network (ANN)

A simple ANN - The Perceptron

Based on an artificial neuron called threshold logic unit (TLU)

\[ \text{heaviside}(z) = \begin{cases} 0 & \text{if } z < 0 \\ 1 & \text{if } z \geq 0 \end{cases} \]

Perceptron composed of one or more TLUs

Every TLU connected to every input = fully connected layer or dense layer

  • \(h_{\mathbf{W},\mathbf{b}}(\mathbf{X}) = \phi(\mathbf{X}\mathbf{W} + \mathbf{b})\)
  • \(\mathbf{b} = \text{bias vector}, \text{one value per neuron}\)
  • \(\phi = \text{activation function}\)

\(\rightarrow\) backpropagation algorithm (see after)
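The dense-layer formula \(h_{\mathbf{W},\mathbf{b}}(\mathbf{X}) = \phi(\mathbf{X}\mathbf{W} + \mathbf{b})\) can be sketched in a few lines of NumPy. The AND weights below are hand-picked for illustration, not learned:

```python
import numpy as np

def heaviside(z):
    # Step activation: 0 if z < 0, 1 if z >= 0
    return (z >= 0).astype(int)

def perceptron_forward(X, W, b):
    # Fully connected (dense) layer: h = phi(XW + b)
    return heaviside(X @ W + b)

# Toy example: 2 inputs, 1 TLU implementing logical AND
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
W = np.array([[1.0], [1.0]])   # one column per neuron
b = np.array([-1.5])           # bias: the TLU fires only when both inputs are 1
print(perceptron_forward(X, W, b).ravel())  # [0 0 0 1]
```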

Multilayer Perceptron - XOR example

A | B | A XOR B
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0

ANN with deep stack of hidden layers = deep neural network
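A single TLU cannot represent XOR, but stacking two layers can. A sketch with hand-picked weights (a real network would learn them via backpropagation):

```python
import numpy as np

def heaviside(z):
    return (z >= 0).astype(int)

# Hidden layer computes (OR, NAND); the output layer ANDs them together
W1 = np.array([[1.0, -1.0],
               [1.0, -1.0]])
b1 = np.array([-0.5, 1.5])     # OR fires if A+B >= 1; NAND fires unless both are 1
W2 = np.array([[1.0], [1.0]])
b2 = np.array([-1.5])          # AND of the two hidden units

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
hidden = heaviside(X @ W1 + b1)
out = heaviside(hidden @ W2 + b2)
print(out.ravel())  # [0 1 1 0], matching the XOR truth table
```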

Tweaking parameters to minimize the cost

Famous optimization algorithm - gradient descent (iterative process)

  • \(\boldsymbol{\theta} = \text{parameter vector}\)
  • \(\eta = \text{learning step} = \text{learning rate}\)
  • \(\boldsymbol{\theta}^{(\text{next step})} = \boldsymbol{\theta} - \eta \nabla_{\boldsymbol{\theta}} \text{Cost}(\boldsymbol{\theta})\)
  • \(\text{If } \text{Cost}(\boldsymbol{\theta}) = \text{RMSE}(\boldsymbol{\theta})\)
    • \(\min \text{Cost}(\boldsymbol{\theta}) = \min \sqrt{\frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2}\)


Have you ever seen an ANN in action?

TensorFlow Playground

Now that we know you’ve all used chatGPT


Large Language Models (LLMs)

Part III

LLMs, what’s all the fuss about?

LLMs are huge neural networks

  • Billions of parameters (e.g. GPT-3: 175 billion)
  • Specialized in language processing
  • The most famous ones (e.g. GPT-3) are proprietary (i.e. unknown weights)
  • Some models have open weights, in contrast to being fully open source

Converting text to machine data

  • Tokenization - splitting the text into tokens
  • Vectorization - each token receives a vector (e.g. GPT-3 \(\rightarrow\) 12,288 dimensions)
  • Vectors carry semantic meaning (e.g. King - Man + Woman ≈ Queen)
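The King - Man + Woman example can be sketched with toy 4-dimensional "embeddings" (the values below are hypothetical; real models use thousands of dimensions):

```python
import numpy as np

# Hypothetical embeddings, hand-crafted so the arithmetic works out
vectors = {
    "king":  np.array([0.9, 0.8, 0.1, 0.6]),
    "queen": np.array([0.9, 0.1, 0.8, 0.6]),
    "man":   np.array([0.1, 0.9, 0.1, 0.2]),
    "woman": np.array([0.1, 0.2, 0.8, 0.2]),
}

def closest(v):
    # Nearest vocabulary word by cosine similarity
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(vectors, key=lambda w: cos(vectors[w], v))

result = vectors["king"] - vectors["man"] + vectors["woman"]
print(closest(result))  # queen
```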

Transformers - Attention is all you need

  • GPT - Generative Pretrained Transformers
  • A transformer is a neural network with many layers
  • Context window - a sequence of tokens used as model input
  • The model outputs a token
  • All tokens pass through the neural network and are modified based on the other tokens
  • e.g. in “black dog”, the vector for “dog” is modified to account for the fact that the dog is black
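The mixing step can be sketched as scaled dot-product self-attention, assuming identity query/key/value projections and made-up 3-d token vectors (real transformers use learned projections and many attention heads):

```python
import numpy as np

def attention(X):
    # Each token's output is a softmax-weighted mix of all token vectors
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return weights @ X

# Hypothetical 3-d vectors for the tokens "black" and "dog"
black = np.array([1.0, 0.0, 0.5])
dog   = np.array([0.0, 1.0, 0.5])
out = attention(np.stack([black, dog]))
# The output vector for "dog" now contains a contribution from "black"
print(out[1])
```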

Trained for probability, not truth

  • The output vector is converted into a probability distribution, from which the next token is selected
  • Each generated token becomes part of the new context window (i.e. continuous text generation)
  • Model training - the model tries to predict the next token, then backpropagation steps in
  • Trained to produce the most probable token, not the most accurate
  • Knowledge learned during training (constant, stored in the model parameters)
  • Knowledge from the context window (different at each interaction)

Interaction with your new AI companion

  1. There is a hidden system prompt that explains to the model that it must simulate a conversation
  2. Then a sentence is given by the assistant (e.g. Claude ❤️)
  3. The user then replies with a sentence: the prompt
  4. Repeat 2 and 3 until you are satisfied
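Steps 1 to 4 amount to maintaining a growing list of messages that is fed back to the model at every turn. The roles and content below are hypothetical, but real chat APIs (e.g. OpenAI, Anthropic) use a similar structure:

```python
# The hidden system prompt (step 1) and the assistant's opener (step 2)
messages = [
    {"role": "system", "content": "You are a helpful assistant. Simulate a conversation."},
    {"role": "assistant", "content": "Hi! How can I help you today?"},
]

def user_says(text):
    # Step 3: the user adds a prompt; the whole history becomes the next model input
    messages.append({"role": "user", "content": text})
    return messages

user_says("Explain the XOR problem.")
print(len(messages))  # 3 messages now form the context for the next reply
```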

Transforming a LLM into a chatbot

  • The model is fine-tuned with reinforcement learning from human feedback (RLHF)
  • Human feedback - rating model responses as good or bad (see later Bing Chat)
  • Safety training through feedback reduces harmful outputs but may limit model capabilities
  • Models are optimized to generate responses that appear convincing to humans
  • Models are instructed to simulate assistant-human conversations using embedded system prompts
  • Prompt injection risks (i.e. jailbreak) - when model safety controls are bypassed

Part IV

Examples and best practices

The science of prompting

The less the model has to guess the better

Non exhaustive list

  • Write clear instructions
  • Provide reference text and/or code
  • Split complex tasks into simpler subtasks
  • Give the model time to think by using a chain of thought (i.e. step-by-step reasoning)
  • Tell the model what to do, rather than what not to do
  • Context matters - start a new chat if you change topics

Additional prompting tips

  • Test multiple models if needed (Claude 3.5 Sonnet, Claude 3 Opus, GPT-4o, GPT-o1, …)
  • Think about your interaction with the assistant (e.g. text, code, or even a screenshot as input)

Images, audio and notes

Image generation from text or image

Audio generation from text

  • NotebookLM - podcast
    Century-scale carbon sequestration flux throughout the ocean by the biological pump
  • Suno, udio - music
    Based on COHERENS's documentation

Getting the info that matters the most

with NotebookLM, powered by Gemini 1.5

  • Upload your sources (PDFs, websites, YouTube videos, Google Docs/Slides, …)
  • NotebookLM will summarize them and make connections (suggest questions) between topics
  • Exact quotes from sources (relies only on uploaded documents)
  • Multimodal - can analyse images and/or plots as well
  • Confidential - data not used to train the model
  • Helpful to summarize papers, anticipate questions, prepare presentations, …

Coding can be fast, really fast

It feels like coding with 4 hands

Shiny app built from scratch

Amazingly fast.

Lunch time!